## Remote Side-Channel Attacks on Heterogeneous SoC

#### Joseph GRAVELLIER (EMSE) Jean-Max DUTERTRE (EMSE) Yannick TEGLIA (THALES) Philippe LOUBET-MOUNDI (THALES) Francis OLIVIER (THALES)

Laboratoire de Sécurité des Architectures et des Systèmes, F-13541 Gardanne France Thales - 13600 La Ciotat, France

November 2019







Joseph GRAVELLIER

CARDIS 2019

November 2019 1 / 26

### Context

#### Usual Hardware Attacks



- Type: fault injection attack (FIA) & side-channel attack (SCA).
- Target: smart cards, microcontrollers, system on chip...
- Means: oscilloscope, laser, EM probe...
- Range: local, direct physical access required.



### Context

#### Remote Hardware Attacks



- Type: fault injection attack (FIA) & side-channel attack (SCA).
- Range: remote, access to a network required.
- Target: connected devices (IoT), data centers...
- Means: resources available within the target.





- Remote hardware attacks keep gaining popularity:
  - Emergence of cloud services, IoT, decentralized computing.








#### Basics



- Usual hardware attacks can be entirely reproduced within FPGA logic:
  - Encryption **algorithm** implementation.
  - Voltage glitch injector implementation (Krautter et al).
  - Voltage **sensor** implementation (Schellenberg et al).








#### Threat model and related works

Target: connected devices that embed FPGAs.

- A) Intra-FPGA attack: multi-user FPGAs in cloud datacenters (Schellenberg et al).
- B) Inter-FPGA attack: printed circuit boards (PCB) (Schellenberg et al).
- C) FPGA-to-CPU attack: heterogeneous connected SoCs (Zhao et al).








- Already demonstrated:
  - CPU computations can be eavesdropped by FPGA-based sensors.
  - SPA attack on a self-written software RSA using ring oscillators (ROs).
- Our goal:
  - Perform FPGA-based CPA attacks against open-source and deployed software AES implementations.










#### Goal & Challenges

- Iterative implementation:
  - 1) **Test** SCA on **hardware** AES implementation.
  - 2) **Optimize** setup toward SCA on software AES.
  - 3) **Perform** SCA on **software** AES implementations.










- 1) **Hardware** AES encryption key retrieval.
- 2) FPGA-based SCA **optimization**.
- 3) **Software** AES encryption key retrieval.

#### Introduction to Time-to-Digital Converter (TDC) sensor

- Power supply fluctuations  $\Rightarrow$  Propagation delay variations.
- Time-To-Digital converter basics:
  - A *clk* signal propagates through a delay line.
  - A register periodically captures the delay line state.
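
A minimal behavioural sketch of the principle above, assuming a first-order model in which every delay element slows down linearly as the supply voltage drops. The function name `tdc_sample`, the 64-stage line, the 50 ps nominal stage delay and the sensitivity coefficient are illustrative assumptions, not values taken from the slides; the 2.5 ns capture window is half a period of a 200 MHz clock.

```python
def tdc_sample(voltage, n_stages=64, window_ps=2500.0,
               nominal_stage_delay_ps=50.0, sensitivity=1.0, v_nominal=1.0):
    """Return how many delay-line stages the clk edge has crossed at capture time.

    All default parameters are hypothetical values chosen for illustration.
    """
    # Assumed linear model: a voltage drop lengthens each stage delay.
    stage_delay_ps = nominal_stage_delay_ps * (1.0 + sensitivity * (v_nominal - voltage))
    if stage_delay_ps <= 0:
        # Outside the validity range of the linear model: the reading saturates.
        return n_stages
    crossed = int(window_ps / stage_delay_ps)
    # The capture register only has n_stages bits, so the reading is clamped.
    return max(0, min(n_stages, crossed))


nominal_reading = tdc_sample(1.00)  # supply at its nominal level
droop_reading = tdc_sample(0.95)    # supply drooping under load
print(nominal_reading, droop_reading)
```

In this model a voltage droop makes the edge travel through fewer stages before the register captures the line, so the captured value acts as a coarse voltage sample.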

### 1) Hardware AES encryption key retrieval



- Target: Xilinx Zynq 7000 heterogeneous SoC
  - FPGA (Xilinx Artix-7): TDC sensors and AES algorithm.
  - CPU (ARM Cortex-A9): trace export and AES management.
- Experimental setup:
  - TDCs placed horizontally, far away from the AES $\Rightarrow$ worst-case scenario.





#### Hardware AES attack

- Custom VHDL AES designed for the attack.
  - Key size **128 bit**, datapath **128 bit**.
  - AES encryption time @10MHz  $\Rightarrow$  $1.1\,\mu s$.
  - Synchronisation  $\Rightarrow$  encryption and measurement launched **simultaneously**.
  - CPA model  $\Rightarrow$  AES last round:  $HW[ARK_9 \oplus ARK_{10}]$.
- Results: number of traces required to infer an AES key byte: 4,483.
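
The last-round model $HW[ARK_9 \oplus ARK_{10}]$ can be sketched per key byte as follows. This is an illustration under assumptions, not the authors' code: it assumes the state register byte at a given position is overwritten by the ciphertext byte at the same position, and all function names (`shiftrows_source`, `last_round_hypothesis`, ...) are hypothetical. The S-box is rebuilt from its definition so the block stays self-contained.

```python
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
INV_SBOX = [0] * 256
for value, substituted in enumerate(SBOX):
    INV_SBOX[substituted] = value


def hamming_weight(x):
    return bin(x).count("1")


def shiftrows_source(j):
    """Index of the round-9 byte that ShiftRows moves to ciphertext position j."""
    row, col = j % 4, j // 4
    return row + 4 * ((col + row) % 4)


def last_round_hypothesis(ciphertext, j, key_guess):
    """Predicted register transition for one byte, given a guess of last-round key byte j."""
    ark9_byte = INV_SBOX[ciphertext[j] ^ key_guess]
    # Assumption: this register byte is overwritten by the ciphertext byte
    # located at the position the round-9 byte came from.
    return hamming_weight(ark9_byte ^ ciphertext[shiftrows_source(j)])


# Self-check on a synthetic last round (SubBytes, ShiftRows, AddRoundKey).
rng = random.Random(1)
ark9 = [rng.randrange(256) for _ in range(16)]
k10 = [rng.randrange(256) for _ in range(16)]
ark10 = [SBOX[ark9[shiftrows_source(j)]] ^ k10[j] for j in range(16)]
for j in range(16):
    src = shiftrows_source(j)
    assert last_round_hypothesis(ark10, j, k10[j]) == hamming_weight(ark9[src] ^ ark10[src])
```

With the correct key guess the prediction equals the true register transition, which is what lets the correlation peak single out the right key byte.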



### 2) FPGA-based SCA optimization



#### Presentation



- Several levers:
  - Placement: TDCs proximity to the target.
  - Performance: TDCs structure modifications.










#### Sensor proximity to the target

- Assumption:
  - Sensor proximity to the target should improve CPA results.
  - Less distance means less acquired noise.
- Experimental setup:
  - Far setup: 80 slices between AES & TDCs.
  - Close setup: 6 slices between AES & TDCs.
- Results: the number of traces required for CPA drops from 4,483 to 3,440.

#### **init** delay length / voltage integration duration

- Fixed (classic) **init** delay:
  - Adds a 180° phase shift to form the $\delta clk$ signal.
  - Integrates voltage fluctuations during a **half** *clk* period.
- Reconfigurable (new) **init** delay:
  - Adds an $n \times 180^{\circ}$ phase shift to form the $\delta clk$ signal.
  - Integrates voltage fluctuations during $n$ **half** *clk* periods.
- Results: the number of traces required for CPA **drops** from 3,440 to 1,381.
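
As a small worked example of the relation between the init delay and the integration duration, the sketch below computes the phase shift and integration window for a few values of $n$. It assumes the delay line is clocked at 200 MHz (the TDC frequency quoted elsewhere in these slides); the function names are hypothetical.

```python
def integration_window_ns(n, clk_mhz=200.0):
    """Duration over which the TDC integrates voltage fluctuations.

    n       -- number of half clk periods set by the init delay (n = 1 is the fixed case)
    clk_mhz -- assumed TDC clock frequency in MHz
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    half_period_ns = 1000.0 / clk_mhz / 2.0
    return n * half_period_ns


def phase_shift_degrees(n):
    """Phase shift applied to clk to form the delta-clk signal."""
    return n * 180


for n in (1, 2, 4):
    print(n, phase_shift_degrees(n), integration_window_ns(n))
```

Under that assumption the classic TDC ($n = 1$) integrates over 2.5 ns, and each extra half period lengthens the window by the same amount.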


#### Optimization Results & Discussion

- Results:

| TDC calibration  | Average number of traces | Optimization factor |
|------------------|--------------------------|---------------------|
| None             | 4,483                    | /                   |
| Placement        | 3,440                    | 1.30                |
| Init + placement | 1,381                    | 3.25                |

- TDC calibration is essential for the CPU attacks that follow:
  - Low CPU-to-FPGA side-channel leakage.
  - CPU frequency @666MHz >> TDC frequency @200MHz.
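
The optimization factors in the table are the baseline trace count divided by the optimized one. The short sketch below recomputes them, together with the number of CPU cycles that elapse per TDC sample; the variable and function names are hypothetical.

```python
# Average number of traces per calibration step, taken from the table above.
traces_needed = {"none": 4483, "placement": 3440, "init + placement": 1381}


def optimization_factor(baseline, optimized):
    """How many times fewer traces the optimized setup needs."""
    return baseline / optimized


factors = {
    name: round(optimization_factor(traces_needed["none"], count), 2)
    for name, count in traces_needed.items()
}

# CPU clock (666 MHz) versus TDC clock (200 MHz).
cpu_cycles_per_tdc_sample = 666 / 200

print(factors)
print(round(cpu_cycles_per_tdc_sample, 2))
```

More than three CPU cycles pass between two TDC samples, which is one reason why the sensor has to be tuned before targeting software running on the CPU.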



### 3) Software AES encryption key retrieval

#### Experimental Setup

- Two freely available software AES implementations studied (bare-metal programming):
  - Tiny AES 128: 8-bit datapath.
  - OpenSSL AES 128: 32-bit datapath (T-table).
- Experimental setup:
  - 8 TDCs placed vertically on the left part of the FPGA $\Rightarrow$ makes sense given the implemented design view.




#### Tiny AES attack

- Small and portable implementation of AES written in C.
  - Encryption time @666MHz  $\Rightarrow$  **40** $\mu$s.
  - CPA model  $\Rightarrow$  AES first round S-box:  $HW[Sbox(k \oplus m)]$.
- Number of traces required to infer an AES key byte: 111,758.
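
To illustrate how the first-round model $HW[Sbox(k \oplus m)]$ is used in a CPA, here is a minimal sketch on simulated data. It is not the authors' setup: each "trace" is reduced to a single sample equal to the modelled leakage plus Gaussian noise, and the key byte, trace count, noise level and function names are all hypothetical.

```python
import math
import random


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
HW = [bin(x).count("1") for x in range(256)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def cpa_recover_byte(plaintexts, leakage):
    """Return the key guess whose hypothesis correlates best with the leakage."""
    scores = []
    for guess in range(256):
        hypothesis = [HW[SBOX[m ^ guess]] for m in plaintexts]
        scores.append(abs(pearson(hypothesis, leakage)))
    return max(range(256), key=scores.__getitem__)


# Simulated acquisition campaign (all values below are illustrative).
rng = random.Random(0)
SECRET_KEY_BYTE = 0x2B
plaintexts = [rng.randrange(256) for _ in range(1000)]
leakage = [HW[SBOX[m ^ SECRET_KEY_BYTE]] + rng.gauss(0.0, 1.5) for m in plaintexts]

recovered = cpa_recover_byte(plaintexts, leakage)
print(hex(recovered))
```

On real TDC traces the same correlation would be computed for every time sample, and far more traces are needed because the CPU-to-FPGA leakage is much weaker than in this simulation.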




#### OpenSSL attack

- Crypto library used for secure channels over computer networks.
  - Datapath **32 bit** (T-table).
  - AES encryption time @666MHz  $\Rightarrow$  **2.90** $\mu$s.
  - CPA model  $\Rightarrow$  AES first round *Sbox*:  $HW[Sbox(k \oplus m)]$.
- Number of traces required to infer an AES key byte: 127,558.
- Improved results with a T-table model: 87,422 traces.
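
The slides do not spell out the T-table model, so the following is only a plausible sketch: it assumes the predicted value is the 32-bit word returned by the first-round table lookup, built the way OpenSSL's `Te0` table is defined (bytes $2 \cdot S$, $S$, $S$, $3 \cdot S$). The function names `te0`, `ttable_hypothesis` and `sbox_hypothesis` are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """AES S-box: multiplicative inverse followed by the affine transformation."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def hamming_weight(x):
    return bin(x).count("1")


def te0(x):
    """32-bit T-table entry combining SubBytes with one MixColumns column."""
    s = SBOX[x]
    return (gf_mul(s, 2) << 24) | (s << 16) | (s << 8) | gf_mul(s, 3)


def sbox_hypothesis(plaintext_byte, key_guess):
    """8-bit model: Hamming weight of the S-box output."""
    return hamming_weight(SBOX[plaintext_byte ^ key_guess])


def ttable_hypothesis(plaintext_byte, key_guess):
    """Assumed 32-bit model: Hamming weight of the T-table word."""
    return hamming_weight(te0(plaintext_byte ^ key_guess))


print(hex(te0(0x00)), sbox_hypothesis(0x00, 0x00), ttable_hypothesis(0x00, 0x00))
```

A 32-bit prediction matches more closely what a T-table implementation moves over its datapath for each lookup, which is consistent with the lower trace count reported for the T-table model.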

- Goal: compare TDC results against classical (local) SCA.
- Experimental setup:
  - Probe: Langer ICR HH 150
  - Oscilloscope sampling rate: 5 GS/s
- Two hotspots:
  - 1) Best results for hardware AES algorithms (FPGA).
  - 2) Best results for software AES algorithms (CPU).



- CEMA (correlation electromagnetic analysis) conducted against each AES studied.
  - Oscilloscope sampling rate (5 GS/s) >> TDC sampling rate (200 MS/s).
  - Oscilloscope resolution >> TDC resolution.
- Results (number of traces required):

| Setup | HAES  | Tiny AES | OpenSSL 1 | OpenSSL 2 |
|-------|-------|----------|-----------|-----------|
| EM    | 1,021 | 52,438   | 106,225   | 88,412    |
| TDC   | 1,381 | 111,758  | 127,558   | 87,422    |

- TDCs provide results similar to local side-channel measurements:
  - Side-channel leakage behaviour.
  - TDC calibration (position, delay).
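
To put "similar results" in numbers, the sketch below computes the TDC-to-EM trace ratio for each target from the table, along with the sampling-rate ratio between the oscilloscope and the TDC. Variable names are hypothetical; the figures come from the slides.

```python
# Number of traces required, per target and acquisition setup.
traces = {
    "HAES": {"EM": 1021, "TDC": 1381},
    "Tiny AES": {"EM": 52438, "TDC": 111758},
    "OpenSSL 1": {"EM": 106225, "TDC": 127558},
    "OpenSSL 2": {"EM": 88412, "TDC": 87422},
}

# A ratio above 1 means the TDC needs more traces than the EM probe.
tdc_to_em_ratio = {target: row["TDC"] / row["EM"] for target, row in traces.items()}

# Oscilloscope at 5 GS/s versus TDC at 200 MS/s.
sampling_rate_ratio = 5e9 / 200e6

for target, ratio in tdc_to_em_ratio.items():
    print(f"{target}: {ratio:.2f}")
print(sampling_rate_ratio)
```

Even with a 25 times lower sampling rate, the TDC stays within roughly a factor of two of the EM setup on every target, and slightly outperforms it on the second OpenSSL column.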

### Conclusion

- FPGA-to-CPU statistical SCA attacks are practical.
- To-do list:
  - In-depth TDC study (shape, number, chip...).
  - TDCs against side-channel countermeasures (shuffling, masking, random delays, jitter, etc.).



### Thank you! Questions?

#### joseph.gravellier@emse.fr
